
    Tracing Communications and Computational Workload in LJS (Lennard-Jones with Spatial Decomposition)

    LJS (Lennard-Jones with Spatial decomposition) is a molecular dynamics application developed by Steve Plimpton at Sandia National Laboratories [1]. It performs thermodynamic simulations of a system containing a fixed, large number (millions) of atoms or molecules confined within a regular, three-dimensional domain. Since the simulations model interactions at the atomic scale, the computations carried out in a single timestep (iteration) correspond to femtoseconds of real time; a meaningful simulation of the evolution of the system's state therefore typically requires thousands of timesteps or more. The particles in LJS are represented as material points subjected to forces resulting from interactions with other particles. While the general case involves N-body solvers, LJS implements only pair-wise interactions, using the derivative of the Lennard-Jones potential for each particle pair to evaluate the acting forces. The velocities and positions of the particles are updated by integrating Newton's equations of motion (classical molecular dynamics). The interaction range depends on the type of problem modeled; LJS focuses on short-range forces, implementing a cutoff distance rc beyond which interactions are ignored. The O(N²) computational complexity characteristic of systems with long-range interactions is thereby substantially alleviated.

    LJS deploys spatial decomposition of the domain volume to distribute the computations across the available processors of a parallel computer. The decomposition uniformly divides the parallelepiped containing all particles into volumes of equal size, as close in shape to a cube as possible, and assigns each of the resulting cells to a CPU. Correct computation requires that the positions of some particles residing in neighboring cells (how many depends on the value of rc) be known to the local process. This information is exchanged in every timestep via explicit communication with the neighboring nodes in all three dimensions (for details see [2]). LJS also takes advantage of Newton's third law to calculate the force only once per particle pair; if the particles involved belong to cells located on different processors, the results are forwarded to the other node in a "reverse communication" phase.

    Besides the communications occurring in every iteration, additional messages are sent once every preset number of timesteps. Their purpose is to adjust the cell assignments of particles as they move. To minimize the overhead of constructing particle neighbor lists, LJS replaces rc with an extended cutoff radius rs (rs > rc), which accounts for possible particle movement between list updates. Due to the relatively small impact of that phase on the overall behavior of the application, we ignored it in our analysis.
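
    As a concrete (and heavily simplified) illustration of the force computation described above, the C sketch below evaluates pairwise Lennard-Jones forces inside the cutoff rc and exploits Newton's third law to visit each pair only once. The all-pairs double loop and all names (particle, lj_forces, eps, sigma) are our own illustrative choices, not LJS code; LJS itself iterates over neighbor lists within the spatial decomposition.

        #include <stddef.h>

        /* Illustrative particle record; LJS's actual data layout differs. */
        typedef struct { double x, y, z; double fx, fy, fz; } particle;

        /* Accumulate pairwise Lennard-Jones forces for all pairs within the
         * cutoff rc.  The force is the derivative of the potential
         * U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6); Newton's third law
         * lets us update both particles of a pair in one visit. */
        void lj_forces(particle *p, size_t n, double eps, double sigma, double rc)
        {
            double rc2 = rc * rc;
            for (size_t i = 0; i < n; i++)
                for (size_t j = i + 1; j < n; j++) {
                    double dx = p[i].x - p[j].x;
                    double dy = p[i].y - p[j].y;
                    double dz = p[i].z - p[j].z;
                    double r2 = dx*dx + dy*dy + dz*dz;
                    if (r2 > rc2) continue;          /* outside cutoff: ignored */
                    double s2 = sigma * sigma / r2;  /* (sigma/r)^2 */
                    double s6 = s2 * s2 * s2;        /* (sigma/r)^6 */
                    /* |F|/r = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 */
                    double f = 24.0 * eps * s6 * (2.0 * s6 - 1.0) / r2;
                    p[i].fx += f * dx;  p[j].fx -= f * dx;
                    p[i].fy += f * dy;  p[j].fy -= f * dy;
                    p[i].fz += f * dz;  p[j].fz -= f * dz;
                }
        }

    In LJS the inner loop would run over a neighbor list built with the extended radius rs rather than over all j > i, and pairs spanning two processors would trigger the "reverse communication" phase described above.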

    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore, with multiple memory/processor nodes on each chip, and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven, multithreaded, split-transaction processing. MIND is designed to operate either in conjunction with conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real-time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
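
    The toy C sketch below is entirely our own construction, not the MIND design; it is meant only to make the phrase "message-driven processing" concrete. An arriving parcel names an action and a local memory target, and the node dispatches the work locally, split-transaction style, rather than a remote caller blocking on a round trip.

        #include <stdio.h>
        #include <stddef.h>
        #include <stdint.h>

        /* Hypothetical parcel: a message that carries work to the memory it
         * targets instead of moving data to a central CPU. */
        typedef struct {
            uint32_t action;   /* index of the handler to invoke        */
            uint64_t target;   /* local memory address the action is on */
            uint64_t operand;  /* small payload                         */
        } parcel;

        typedef void (*handler_fn)(uint64_t target, uint64_t operand);

        static void do_increment(uint64_t target, uint64_t operand) {
            *(uint64_t *)(uintptr_t)target += operand;  /* act on local state */
        }

        /* Illustrative handler table; a real system would register many actions. */
        static handler_fn handlers[] = { do_increment };

        /* Message-driven loop: each arriving parcel triggers local work,
         * so no remote caller waits on a reply. */
        static void node_dispatch(const parcel *queue, size_t n) {
            for (size_t i = 0; i < n; i++)
                handlers[queue[i].action](queue[i].target, queue[i].operand);
        }

        int main(void) {
            uint64_t counter = 0;
            parcel q[2] = {
                { 0, (uint64_t)(uintptr_t)&counter, 5 },
                { 0, (uint64_t)(uintptr_t)&counter, 7 },
            };
            node_dispatch(q, 2);
            printf("counter = %llu\n", (unsigned long long)counter);  /* 12 */
            return 0;
        }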

    Analysis, Tracing, Characterization and Performance Modeling of Select ASCI Applications for BlueGene/L Using Parallel Discrete Event Simulation

    Caltech's Jet Propulsion Laboratory (JPL) and Center for Advanced Computing Research (CACR) are conducting application and simulation analyses of Blue Gene/L [1] in order to establish the range of effectiveness of the architecture in performing important classes of computations and to determine the design sensitivity of the global interconnect network in support of real-world ASCI application execution.

    Continuum Computer Architecture for Nano-scale and Ultra-high Clock Rate Technologies

    Continuum computer architecture (CCA) is a non-von Neumann architecture that offers an alternative to conventional structures as digital technology evolves towards the nano-scale and the ultimate flat-lining of Moore's law. Coincidentally, it also defines a model of architecture particularly well suited to logic families that exhibit ultra-high clock rates (> 100 GHz), such as rapid single flux quantum (RSFQ) gates. CCA eliminates the concept of the "CPU" that has dominated computer architecture since its inception more than half a century ago, and establishes a new local element that merges the properties of state storage, state transfer, and state operation. A CCA system architecture is a simple multidimensional organization of these elemental blocks and physically may be considered a new family of cellular computer. But CCA differs dramatically from conventional cellular automata: while both deliver emergent global behavior from the aggregation of local rules and their ensuing operation, the emergent behavior of CCA is a global, general-purpose model of parallel computation, as opposed to simply mimicking some limited phenomenon, such as heat and mass transfer, as conventional cellular automata do. This paper presents the motivation and foundational concepts of CCA and exposes key issues for further work.
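
    To make that contrast concrete, the toy C program below implements the conventional-cellular-automaton baseline the text argues CCA goes beyond: each cell updates from purely local neighbor state under one fixed, diffusion-like rule. It is our own illustration, not the CCA design; CCA's elements would instead compose into general-purpose computation rather than a single hard-wired phenomenon.

        #include <stdio.h>
        #include <string.h>

        #define N 8

        /* One step of a conventional cellular automaton: every cell computes
         * its next state from purely local information (itself and its two
         * neighbors on a ring).  The rule mimics heat diffusion -- exactly
         * the kind of limited, fixed-function emergent behavior the paper
         * distinguishes CCA from. */
        static void ca_step(const double cur[N], double next[N]) {
            for (int i = 0; i < N; i++) {
                double left  = cur[(i + N - 1) % N];
                double right = cur[(i + 1) % N];
                next[i] = cur[i] + 0.25 * (left - 2.0 * cur[i] + right);
            }
        }

        int main(void) {
            double a[N] = { 1, 0, 0, 0, 0, 0, 0, 0 }, b[N];
            for (int t = 0; t < 4; t++) {      /* watch the pulse spread */
                ca_step(a, b);
                memcpy(a, b, sizeof a);
            }
            for (int i = 0; i < N; i++) printf("%.3f ", a[i]);
            printf("\n");
            return 0;
        }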

    A General Overview of One's Medical and Teaching Profession


    MPI-IO Implementation Strategies for the Cenju-3

    The lack of a portable parallel I/O interface limits the development of scientific applications. MPI-IO is the first widespread attempt to alleviate this problem. Its efficient implementation requires the developer to face and solve several software design and interface issues. Our paper outlines strategies that may be helpful in this task. Although originally targeted at the NEC Cenju-3, our considerations are applicable to other message-passing platforms as well.

    1 Introduction. An initiative led by NASA Ames and the IBM Watson Research Center has resulted in the creation of MPI-IO, which defines a portable interface for parallel I/O. Currently, MPI-IO is officially incorporated into MPI-2 [4], an ambitious extension of the original MPI. This paper summarizes our experiences with the development of an MPI-IO system for the NEC Cenju-3 supercomputer. The Cenju-3 features a multi-level switch and a NORMA (no remote memory access) architecture. Research supported by a grant from NEC Corporation. The Cenju/DE operating sy..
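
    For readers unfamiliar with the interface, here is a minimal C example of MPI-IO as standardized in MPI-2: every rank writes a disjoint block of one shared file at an explicit, rank-dependent offset. It illustrates only the portable, standard API (MPI_File_open, MPI_File_write_at, MPI_File_close) and is not drawn from our Cenju-3 implementation; the file name "out.dat" and the block size are arbitrary.

        #include <mpi.h>
        #include <stdio.h>

        #define COUNT 4  /* ints written per rank */

        int main(int argc, char **argv) {
            int rank, buf[COUNT];
            MPI_File fh;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            for (int i = 0; i < COUNT; i++)
                buf[i] = rank * COUNT + i;       /* this rank's contribution */

            /* All ranks collectively open one shared file. */
            MPI_File_open(MPI_COMM_WORLD, "out.dat",
                          MPI_MODE_CREATE | MPI_MODE_WRONLY,
                          MPI_INFO_NULL, &fh);

            /* Explicit-offset write: no seek state, so ranks need no
             * coordination to write their disjoint blocks. */
            MPI_Offset off = (MPI_Offset)rank * COUNT * sizeof(int);
            MPI_File_write_at(fh, off, buf, COUNT, MPI_INT, MPI_STATUS_IGNORE);

            MPI_File_close(&fh);
            MPI_Finalize();
            return 0;
        }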